Abstract

Geometric  considerations  suggest  that  the  problem  of  estimating  a  system’s  three-dimensional  (3D) 
motion  from  a  sequence  of  images,  which  has  puzzled  researchers  in  the  fields  of  Computational  Vision  and 
Robotics  as  well  as  the  Biological  Sciences,  can  be  addressed  as  a  pattern  recognition  problem.  Information 
for  constructing  the  relevant  patterns  is  found  in  spatial  arrangements  or  gratings,  that  is,  aggregations  of 
orientations  along  which  retinal  motion  information  is  estimated.  The  exact  form  of  the  gratings  is  defined 
by  the  shape  of  the  retina  or  imaging  surface;  for  a  planar  retina  they  are  radial  lines,  concentric  circles,  as 
well  as  elliptic  and  hyperbolic  curves,  while  for  a  spherical  retina  they  become  longitudinal  and  latitudinal 
circles  for  various  axes.  Considering  retinal  motion  information  computed  normal  to  these  gratings,  patterns 
are  found  that  have  encoded  in  their  shape  and  location  on  the  retina  subsets  of  the  3D  motion  parameters. 
The  importance  of  these  patterns  is  first  that  they  depend  only  on  the  3D  motion  and  not  on  the  scene 
in  view,  thus  providing  globally  a  separation  of  the  effects  of  3D  motion  and  scene  structure  on  the  image 
motion,  and  second  that  they  are  founded  upon  easily  derivable  image  measurements— they  do  not  utilize 
exact retinal motion measurements such as optical flow, but only the sign of image motion along a set
of  directions  defined  by  the  gratings.  The  computational  theory  presented  in  this  paper  explains  how  the 
self-motion  of  a  system  can  be  estimated  by  locating  these  patterns.  We  also  conjecture  that  this  theory  or 
variations  of  it  might  be  implemented  in  nature  and  call  for  experiments  in  the  neurosciences. 
To live is to move and to move is to live. All animals exist in space-time; they move in
their environments and interact with them. Not surprisingly, to detect the sensory effects of
movement  is  the  first  task  of  all  sensory  systems;  and  to  reach  an  understanding  of  movement 
is  a  primary  goal  of  all  later  perceptual  analysis  [3].  Although  the  organism  as  a  whole  might 
move in a nonrigid manner (head, arms, legs and wings undergo different motions), the eyes
move rigidly, i.e., as a sum of instantaneous translation and rotation. The fundamental,
abstract  geometric  concept  used  to  describe  the  computational  analysis  of  visual  motion 
is  that  of  the  two-dimensional  motion  field:  As  a  system  moves  in  its  environment,  every 
point  of  the  environment  has  a  velocity  vector  with  respect  to  the  system.  The  projection 
of  these  three  dimensional  velocity  vectors  on  the  retina  of  the  system’s  eye  constitutes  the 
so-called  motion  field.  This  field  depends  on  the  3D  motion  and  the  structure  of  the  scene  in 
view. Considering a spherical eye moving with a translation t, the motion field is along the
great circles containing the vector t (Figure 1a), pointing away from the Focus of Expansion
(FOE) and towards the Focus of Contraction (FOC). (The FOE and FOC are the points
where t cuts the image sphere.) If the motion of the eye is a rotation of velocity ω where
the rotation axis cuts the sphere at points AOR (the Axis of Rotation point) and −AOR,
the motion field is along the circles resulting from the intersection of the sphere with planes
perpendicular to the rotation axis (Figure 1b). For general rigid motion the motion field on
the sphere is the addition of a translational field and a rotational field (Figure 1c). In this
case,  the  motion  field  does  not  have  a  simple  structure  and  it  becomes  difficult  to  locate 
the  points  FOE  and  AOR,  i.e.,  to  solve  the  problem  of  egomotion  using  the  two-dimensional 
motion  field  as  input. 
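
The superposition just described can be evaluated numerically. The following minimal Python sketch assumes a unit image sphere centred at the eye and the rigid-motion formula (1/|R|)[(t·r)r − t] − ω × r for the image velocity of a scene point R seen along r = R/|R|; the function and helper names are hypothetical. It makes explicit that depth enters only through the translational term.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def motion_field(R, t, w):
    """Image velocity, on a unit sphere, of the scene point R.

    R: scene point, t: translational velocity, w: rotational velocity (omega),
    all expressed in a frame fixed to the centre of the sphere.
    """
    depth = math.sqrt(dot(R, R))                    # |R|
    r = tuple(c / depth for c in R)                 # image point r = R/|R|
    tr = dot(t, r)
    # translational part: scaled by 1/|R|, hence scene-dependent
    translational = tuple((tr * ri - ti) / depth for ri, ti in zip(r, t))
    # rotational part: -w x r, independent of the scene
    rotational = tuple(-c for c in cross(w, r))
    return tuple(a + b for a, b in zip(translational, rotational))
```

For example, motion_field((0, 0, 5), (1, 0, 0), (0, 0, 0)) returns approximately (−0.2, 0, 0): a point straight ahead at distance 5 drifts against a sideways translation, and moving it to distance 10 halves the drift, whereas a pure rotation gives the same image velocity at any distance.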

The  problem  is  even  more  difficult,  since  what  can  be  derived  from  the  sequence  of 
images  sensed  by  the  moving  retina  is  not  the  exact  projection  of  the  3D  motion  field  but 
only  information  about  the  movement  of  light  patterns.  In  the  literature  the  exact  movement 
of  every  point  on  the  image  is  termed  the  optical  flow  field.  In  general,  accurate  values  of  the 
optical  flow  field  are  not  computable.  On  the  basis  of  local  information  only  the  component 
of  the  optical  flow  perpendicular  to  edges,  the  so-called  normal  flow,  is  well-defined  (the 
aperture  problem;  see  Figure  2).  In  many  cases,  it  is  possible  to  obtain  additional  flow 
information  for  areas  (patches)  in  the  image.  Thus,  the  input  that  any  system  can  use  for 
Figure 1: Motion fields on a spherical retina. The image r of a scene point with position
vector R (with respect to an orthonormal coordinate system fixed to the center O of the
sphere) is formed by perspective projection through center O. The sphere undergoes a rigid
motion with translational velocity t and rotational velocity ω. (a) Translational motion
field: at every point r the motion vector is $\frac{1}{|R|}\left[(t \cdot r)\,r - t\right]$, with $|R|$ being the length of
R and “·” denoting the inner vector product. Thus it is parallel to the great circle passing
through the FOE and the FOC, and its value is inversely proportional to the distance to the
corresponding scene point. (b) Rotational motion field: at every point r the motion vector
is $-\omega \times r$, where “×” denotes the outer vector product. Thus it is parallel to the circle passing
through r perpendicular to ω, and it does not depend on the scene in view. (c) General rigid
motion field: at every point r the motion vector is $\frac{1}{|R|}\left[(t \cdot r)\,r - t\right] - \omega \times r$.
further motion processing is partial optical flow information. The following analysis is based
on a minimum amount of knowledge about image motion, namely the sign of the projection of
optical flow along directions where it can be robustly computed. These measurements along
a set of appropriately chosen orientations possess a rich global structure: they form simple
patterns on the image surface whose location and form encode the 3D motion parameters.
A  definition  of  the  selected  directions  is  given  below. 


Figure 2: (a) Line feature observed through a small aperture at time t. (b) At time t + δt
the  feature  has  moved  to  a  new  position.  It  is  not  possible  to  determine  exactly  where  each 
point  has  moved  to.  From  local  measurements,  only  the  flow  component  perpendicular  to 
the  line  feature  can  be  computed. 
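
The aperture problem can be made concrete with the brightness-constancy constraint Ix·u + Iy·v + It = 0, a standard modelling assumption that this text does not itself state: one equation in the two flow components fixes only the component along the image gradient, i.e., perpendicular to the edge. A minimal sketch with hypothetical names, returning that normal flow from the spatial derivatives Ix, Iy and the temporal derivative It at one pixel:

```python
import math


def normal_flow(Ix, Iy, It, eps=1e-9):
    """Normal flow at one pixel under the brightness-constancy assumption.

    Only the flow component along the gradient (Ix, Iy) is determined; the
    component along the edge is unconstrained (aperture problem).
    Returns (unit gradient direction, signed speed along it), or None when
    the gradient is too weak to define an edge direction.
    """
    g = math.hypot(Ix, Iy)
    if g < eps:
        return None
    n = (Ix / g, Iy / g)      # unit normal to the edge
    speed = -It / g           # n·(u, v), from Ix*u + Iy*v + It = 0
    return n, speed
```

A vertical edge with Ix = 2, Iy = 0 and It = −4 yields the unit normal (1, 0) and a normal speed of 2 pixels per frame; any motion along the edge is invisible to this measurement.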


1  Selection  of  Flow  Directions 

Two classes of orientations are introduced which are defined with regard to an axis. Consider
an axis s passing through the center of a spherical eye and cutting the sphere at points N and
S.  The  unit  vectors  tangential  to  the  great  circles  containing  s  define  a  direction  for  every 
point  on  the  retina  (Figure  3a).  These  directions  are  called  s-longitudinal.  Similarly,  the 
s-latitudinal  directions  are  defined  as  the  unit  vectors  tangential  to  the  circles  resulting  from 
the  intersections  of  the  sphere  with  planes  perpendicular  to  the  axis  s  (Figure  3b).  At  each 
point  the  s-longitudinal  and  latitudinal  vectors  are  perpendicular  to  each  other. 
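
Both direction fields are easy to generate from the formulas given with Figure 3: (s·r)r − s for the longitudinal and −s × r for the latitudinal direction at a unit image point r. The sketch below (hypothetical helper names) normalises them and can be used to confirm that they are tangent to the sphere and mutually perpendicular.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def longitudinal(s, r):
    """Unit s-longitudinal direction at unit point r: (s·r) r - s, normalised.

    Tangent to the great circle through N, S and r; undefined at N and S,
    where r is parallel to s.
    """
    k = dot(s, r)
    return normalize(tuple(k * ri - si for ri, si in zip(r, s)))


def latitudinal(s, r):
    """Unit s-latitudinal direction at unit point r: -s x r, normalised.

    Tangent to the circle cut from the sphere by the plane through r
    perpendicular to s.
    """
    return normalize(tuple(-c for c in cross(s, r)))
```

With s along the z-axis and r = (1, 0, 0) on the equator, the longitudinal direction is (0, 0, −1), pointing away from N, and the latitudinal direction is (0, −1, 0).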

Some properties of these directions will be of use later: Consider two axes s₁ (N₁S₁) and
s₂ (N₂S₂). Each axis defines at every point a longitudinal and a latitudinal direction. The
locus of points on the sphere where the s₁-longitudinal directions are perpendicular to the
s₂-longitudinal directions (or where the s₁-latitudinal directions are perpendicular to the
s₂-latitudinal directions) constitutes two quadratic curves whose geometry is explained in

Figure 3: (a) A longitudinal vector field defined by axis s. At every point r, a longitudinal
vector has direction $(s \cdot r)\,r - s$. (b) A latitudinal vector field defined by axis s. At every
point r, a latitudinal vector has direction $-s \times r$.

Figure 4a. Similarly, the longitudinal directions of one axis and the latitudinal directions of
the other axis are perpendicular to each other along the great circle defined by s₁ and s₂
(Figure 4b).
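
Both loci follow from two vector identities: (s₁ × r)·(s₂ × r) = (s₁·s₂)(r·r) − (s₁·r)(s₂·r), which for |r| = 1 equals the inner product of the two unnormalised longitudinal vectors (and of the two latitudinal ones), and the cyclic property of the scalar triple product, which reduces the mixed longitudinal/latitudinal product to (s₁ × s₂)·r. A small numerical check of both statements, with hypothetical helper names:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lon(s, r):
    """Unnormalised s-longitudinal vector (s·r) r - s."""
    k = dot(s, r)
    return tuple(k * ri - si for ri, si in zip(r, s))


def lat(s, r):
    """Unnormalised s-latitudinal vector -s x r."""
    return tuple(-c for c in cross(s, r))


def lon_lon(s1, s2, r):
    """s1·s2 - (s1·r)(s2·r): zero exactly on the closed curves of Figure 4a."""
    return dot(s1, s2) - dot(s1, r) * dot(s2, r)


def lon_lat(s1, s2, r):
    """(s1 x s2)·r: zero exactly on the great circle of Figure 4b."""
    return dot(cross(s1, s2), r)
```

Evaluated over unit points, lon_lon changes sign across the curves of Figure 4a and lon_lat across the great circle of Figure 4b.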

The structure of the projections of a rigid motion field on the s-longitudinal and s-latitudinal
vectors will next be examined. More precisely, the signs of the projections of the
motion field on the longitudinal and latitudinal vectors will be investigated, since this will
be the information employed as input to the motion interpretation process. For this purpose
it is necessary to agree upon a definition of the signs. The s (NS)-longitudinal vectors are called
positive (+) if they point away from N, negative (−) if they point away from S, and zero
(0) otherwise. Similarly, s-latitudinal vectors are referred to as positive (+) if their direction
is counterclockwise with respect to s, negative (−) if their direction is clockwise, and zero
(0) otherwise (Figure 5).
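
With these conventions, the only measurement needed at an image point is the sign of the flow's projection on the chosen direction. A minimal sketch, assuming the direction formulas of Figure 3 and a hypothetical dead-zone tolerance tol below which a projection is reported as zero. The latitudinal sign is taken relative to −s × r, the direction given in Figure 3b; whether that coincides with the counterclockwise sense depends on the viewing convention, so treat it as an assumption.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign_with_tolerance(x, tol):
    """+1, -1, or 0 when |x| <= tol (hypothetical dead zone for noise)."""
    return (x > tol) - (x < -tol)


def longitudinal_sign(flow, s, r, tol=1e-9):
    """Sign of the flow at unit point r along (s·r) r - s.

    +1: the flow has a component pointing away from N; -1: away from S.
    The direction is left unnormalised, since a positive scale factor
    cannot change the sign.
    """
    k = dot(s, r)
    direction = [k * ri - si for ri, si in zip(r, s)]
    return sign_with_tolerance(dot(flow, direction), tol)


def latitudinal_sign(flow, s, r, tol=1e-9):
    """Sign of the flow at unit point r along the latitudinal vector -s x r."""
    direction = [-(s[1] * r[2] - s[2] * r[1]),
                 -(s[2] * r[0] - s[0] * r[2]),
                 -(s[0] * r[1] - s[1] * r[0])]
    return sign_with_tolerance(dot(flow, direction), tol)
```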

Figure 4: (a) On the sphere, the great circles containing s₁ and s₂ are perpendicular to
each other on two closed second order curves, whose form depends on the angle between s₁
and s₂. These curves are defined as the set of points r for which $(s_1 \times r) \cdot (s_2 \times r) = 0$ or
$(s_1 \cdot r)(s_2 \cdot r) = s_1 \cdot s_2$. (b) The s₁-longitudinal vectors are perpendicular to the s₂-latitudinal
vectors along the great circle through s₁ and s₂, defined as $(s_1 \times s_2) \cdot r = 0$.

2  The  Geometry  of  Image  Motion  Patterns 

Since  a  rigid  motion  field  is  the  addition  of  a  translational  and  a  rotational  field,  the  cases 
of  pure  translation  and  pure  rotation  are  first  presented  separately. 

If the observer moves with a pure translation of velocity t, the motion field on the sphere
is along the direction of the t-longitudinal vectors (Figure 1a). Projecting the translational
motion field of Figure 1a on the s-longitudinal vectors of Figure 3a, the resulting vectors
will  be  either  zero,  positive  or  negative.  The  vectors  will  be  zero  on  two  curves  as  shown  in 
Figure  4a  (symmetric  around  the  center  of  the  sphere)  whose  shape  depends  on  the  angle 
between  the  vectors  t  and  s.  The  area  inside  the  curves  will  contain  negative  vectors  and 
the  area  outside  the  curves  will  contain  positive  vectors  (Figure  6a). 
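
Projecting the translational field (1/|R|)[(t·r)r − t] onto the s-longitudinal vector (s·r)r − s gives, for |r| = 1, the value (1/|R|)[s·t − (s·r)(t·r)]. Since 1/|R| > 0, its sign is fixed by the directions of t and s alone. A minimal sketch (function name hypothetical) that evaluates this depth-free sign:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def translational_longitudinal_sign(t, s, r, tol=1e-12):
    """Sign of the translational flow at unit point r projected on the
    s-longitudinal vector, i.e. the sign of s·t - (s·r)(t·r).

    The true projection carries an extra factor 1/|R| > 0, so the depth of
    the scene point cannot change the sign and is not needed.
    """
    value = dot(s, t) - dot(s, r) * dot(t, r)
    return (value > tol) - (value < -tol)
```

Both the FOE (r along t) and N (r along s) give zero, consistent with the zero curves passing through these points.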

If the observer moves purely rotationally with velocity ω, the motion field on the sphere is
along the direction of the ω-latitudinal vectors (Figure 1b). Projecting the rotational motion
field of Figure 1b on the s (NS)-longitudinal vectors of Figure 3a, the resulting vectors will
Figure 5: Positive and negative longitudinal (a) and latitudinal (b) image motion measurements.
The input employed in the motion interpretation process is the sign of the image
motion’s value in the longitudinal and latitudinal directions.

be either zero, positive or negative. The projections will be zero on the great circle defined
by s and ω, positive in one hemisphere and negative in the other (Figure 6b).
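
For the rotational field −ω × r the same projection reduces, by the cyclic property of the scalar triple product, to (s × ω)·r, a linear function of r whose zero set is the great circle through s and ω. A hypothetical helper evaluating its sign:

```python
def rotational_longitudinal_sign(w, s, r, tol=1e-12):
    """Sign of the rotational flow -w x r at unit point r projected on the
    s-longitudinal vector (s·r) r - s, which equals (s x w)·r.

    Depth does not appear: the rotational field is scene-independent.
    """
    c = (s[1] * w[2] - s[2] * w[1],
         s[2] * w[0] - s[0] * w[2],
         s[0] * w[1] - s[1] * w[0])          # s x w
    value = c[0] * r[0] + c[1] * r[1] + c[2] * r[2]
    return (value > tol) - (value < -tol)
```

Because the value is linear in r, antipodal points always receive opposite signs, which is why the two hemispheres carry opposite signs.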

If the observer translates and rotates with velocities t and ω, the projection of the general
motion field on any set of s-longitudinal vectors can be classified for parts of the image. If at
a  longitudinal  vector  the  projections  of  both  the  translational  and  the  rotational  vectors  are 
positive,  then  the  projection  of  the  image  motion  (the  sum  of  the  translational  and  rotational 
vectors)  will  also  be  positive.  Similarly,  if  the  projections  of  both  the  translational  and 
rotational  vectors  on  a  longitudinal  vector  are  negative,  the  projection  of  the  motion  vector  at 
this  point  will  also  be  negative.  In  other  words,  if  the  values  of  Figures  6a  and  6b  are  added, 
whenever  positive  and  positive  come  together,  the  result  will  be  positive,  and  whenever 
negative  and  negative  come  together,  the  result  will  be  negative.  However,  whenever  positive 
and  negative  come  together,  the  result  cannot  be  determined  without  knowledge  of  the 
environment.  In  such  a  case  the  sign  of  the  projection  of  the  rigid  motion  vector  depends 
on the values of the translational and rotational vector components and thus on the lengths
of the vectors t and ω and on the depth of the scene. (Actually, this “don’t know” area also
contains  rich  information  regarding  3D  motion  and  structure  [4].) 
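
The addition rule above amounts to a small truth table. In this hypothetical helper, signs are encoded as +1, −1 and 0, and None stands for the “don’t know” case; a zero component (a point on one of the zero curves) simply passes the other component's sign through, by the same argument.

```python
def combine(trans_sign, rot_sign):
    """Sign of the rigid-motion projection from the signs of its two parts.

    Signs are +1, -1 or 0. Returns None ("don't know") when the parts have
    opposite signs, because the result then depends on |t|, |w| and depth.
    """
    if trans_sign * rot_sign < 0:
        return None
    # equal signs, or one part is zero and the other one decides
    return trans_sign if trans_sign != 0 else rot_sign
```
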
Figure 6: s-longitudinal pattern. (a) At every point r the projection of the translational
motion vector on the s-longitudinal vector is
$\frac{1}{|R|}\left[(t \cdot r)\,r - t\right] \cdot \left[(s \cdot r)\,r - s\right] = \frac{1}{|R|}\left[s \cdot t - (s \cdot r)(t \cdot r)\right]$.
It is zero on the curves $s \cdot t = (s \cdot r)(t \cdot r)$ (as shown in Figure 4a), negative inside the curves
and positive outside the curves. (b) At every point r the projection of the rotational motion
vector on the s-longitudinal vector is $-(\omega \times r) \cdot \left[(s \cdot r)\,r - s\right] = (s \times \omega) \cdot r$. It is zero on the
great circle $(s \times \omega) \cdot r = 0$ passing through s and ω, positive in one hemisphere and negative
in the other. (c) A general rigid image motion defines a pattern along every s-longitudinal
vector field: an area of negative values, an area of positive values, and an area of values
whose signs are unknown since they depend on the scene.


Thus, the distribution of the sign of image motion along any s-longitudinal set of directions
defines a pattern on the sphere. Considering a general rigid motion field due to
translation t and rotation ω on an s (NS)-longitudinal set of directions, a pattern like the
one shown in Figure 6c is obtained, which consists of an area of strictly positive values,
an area of strictly negative values, and an area in which the values cannot be determined
without more information. The pattern is characterized by one great circle containing ω and
s and by two quadratic curves containing the points FOE, FOC, N and S.
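
That the definite part of this pattern is the same for every scene can be checked by simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it labels a unit image point r from t, ω and s alone using the two sign rules of Figure 6, and separately simulates the actual rigid flow at an arbitrary positive depth; all function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sgn(x, tol=1e-12):
    return (x > tol) - (x < -tol)


def longitudinal_pattern_label(t, w, s, r):
    """Scene-independent label of unit point r in the s-longitudinal pattern.

    Returns +1 or -1 where the sign is fixed by (t, w) alone, 0 where both
    components vanish, and None in the "don't know" area.
    """
    ts = sgn(dot(s, t) - dot(s, r) * dot(t, r))   # translational part, Figure 6a
    rs = sgn(dot(cross(s, w), r))                  # rotational part, Figure 6b
    if ts * rs < 0:
        return None
    return ts if ts != 0 else rs


def measured_longitudinal_sign(t, w, s, r, depth):
    """Simulated measurement: sign of the rigid flow at depth |R| = depth
    projected on the s-longitudinal vector (s·r) r - s."""
    tr = dot(t, r)
    wr = cross(w, r)
    flow = [(tr * ri - ti) / depth - ci for ri, ti, ci in zip(r, t, wr)]
    k = dot(s, r)
    return sgn(dot(flow, [k * ri - si for ri, si in zip(r, s)]))


def random_unit(rng):
    """Uniformly distributed unit vector (rejection sampling in the cube)."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        n = math.sqrt(dot(v, v))
        if 1e-3 < n <= 1.0:
            return [c / n for c in v]
```

Drawing random points and depths and comparing measured_longitudinal_sign with longitudinal_pattern_label wherever the label is not None should never produce a mismatch, while points labelled None can take either sign depending on depth.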

It  is  worth  stressing  that  the  pattern  of  Figure  6c  is  independent  of  the  scene  in  view 
and  depends  only  on  a  subset  of  the  3D  motion  parameters.  In  particular,  the  great  circle  is 
defined  by  one  rotational  parameter  and  the  quadratic  curve  by  two  translational  parameters. 
Thus the pattern is of dimension three. Also, the pattern is different for a different choice of
the vector s. In summary, for a rigid motion (t, ω), any axis s and the corresponding set of directions on
the retina, an area of the imaging surface has been identified where the signs of the motion
vectors along these directions do not depend on the scene in view!
Considering the projection of a rigid motion field on the s-latitudinal directions (defined
by the vector s (NS)), another pattern (see Figure 7) is obtained which is dual to the one of
Figure 6c. This time the translational latitudinal flow is separated into positive and negative
by a great circle, and the rotational flow by two closed quadratic curves (as in Figure 4a)
passing through the points AOR, −AOR, N and S.
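
The dual pattern can be checked in the same way. Projecting on the latitudinal vector −s × r, the translational part becomes (1/|R|)(t × s)·r, whose zero set is the great circle through s and t, and the rotational part becomes ω·s − (ω·r)(s·r), whose zero set has the same form as the curves of Figure 4a and passes through AOR, −AOR, N and S. A sketch under the same assumptions as before, with hypothetical names:

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sgn(x, tol=1e-12):
    return (x > tol) - (x < -tol)


def latitudinal_pattern_label(t, w, s, r):
    """Scene-independent label of unit point r in the s-latitudinal pattern.

    Returns +1 or -1 where the sign is fixed by (t, w) alone, 0 where both
    components vanish, and None in the "don't know" area.
    """
    ts = sgn(dot(cross(t, s), r))                  # great circle through s and t
    rs = sgn(dot(w, s) - dot(w, r) * dot(s, r))    # curves through AOR, -AOR, N, S
    if ts * rs < 0:
        return None
    return ts if ts != 0 else rs


def measured_latitudinal_sign(t, w, s, r, depth):
    """Simulated measurement: sign of the rigid flow at depth |R| = depth
    projected on the s-latitudinal vector -s x r."""
    tr = dot(t, r)
    wr = cross(w, r)
    flow = [(tr * ri - ti) / depth - ci for ri, ti, ci in zip(r, t, wr)]
    return sgn(dot(flow, [-c for c in cross(s, r)]))


def random_unit(rng):
    """Uniformly distributed unit vector (rejection sampling in the cube)."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        n = math.sqrt(dot(v, v))
        if 1e-3 < n <= 1.0:
            return [c / n for c in v]
```
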


Figure  7:  s-latitudinal  pattern.  The  translational  flow  is  separated  into  positive  and  negative 
areas  by  a  great  circle  and  the  rotational  flow  by  two  closed  quadratic  curves. 


3  Egomotion  Estimation  Through  Pattern  Matching 

The  geometric  analysis  described  above  allows  us  to  formulate  the  problem  of  egomotion 
estimation  as  a  pattern  recognition  problem.  Assume  that  the  system  has  the  capability 
of  estimating  the  sign  of  the  retinal  motion  along  a  set  of  directions  defined  by  various  s- 
longitudinal  or  latitudinal  fields.  If  the  system  can  locate  the  patterns  of  Figures  6c  and  7  in 
each  longitudinal  and  latitudinal  vector  field,  then  it  has  effectively  recognized  the  directions 
t and ω. The intersections of the quadratic curves of the patterns in Figure 6c and the great
circles of the patterns in Figure 7 provide the points FOE and FOC, and the intersections
of the great circles of the patterns in Figure 6c and the quadratic curves of the patterns in
Figure 7 provide the points AOR and −AOR. Each single pattern provides only constraints
on  the  locations  of  the  FOE  and  the  AOR,  but  a  collection  of  patterns  constrains  these 
locations  to  small  areas  or  even  single  points.  It  depends  on  the  computational  power  of  the 
system  how  much  information  will  be  available  for  pattern  fitting  and  thus  how  accurately 
the  FOE  and  the  AOR  can  be  localized.  If  the  system  is  able  to  derive  optical  flow,  then 
it is able to estimate the sign of the projection of the flow along any direction and thus for
every  pattern  at  every  point  information  is  available.  If,  however,  the  system  is  less  powerful 
and  can  only  compute  the  motion  in  one  direction  (normal  flow)  or  the  sign  of  the  motion 
in  a  few  directions,  then  the  solution  proceeds  exactly  as  before.  The  difference  is  that  for 
each  longitudinal  or  latitudinal  set  of  directions,  information  (positive,  negative  or  zero)  is 
not  available  at  every  point  of  the  sphere,  and  consequently  the  uncertainty  may  be  larger 
and  the  FOE  and  AOR  may  be  obtained  only  within  bounds. 
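
One way to turn the matching idea into a procedure is to score a candidate pair of directions by how many measured signs contradict the patterns it predicts, and to keep the candidate with the fewest contradictions. The sketch below assumes longitudinal patterns only, a given finite list of candidates, and simulated sign measurements; the exhaustive search and every name in it are hypothetical illustrations, not the fitting procedure used in the experiments. Because the predicted signs depend only on the directions of t and ω, candidates need not be normalised.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sgn(x, tol=1e-12):
    return (x > tol) - (x < -tol)


def predicted_label(t, w, s, r):
    """s-longitudinal pattern label at unit point r: +1/-1 where the sign is
    fixed by the motion directions alone, None in the "don't know" area."""
    ts = sgn(dot(s, t) - dot(s, r) * dot(t, r))
    rs = sgn(dot(cross(s, w), r))
    if ts * rs < 0:
        return None
    return ts if ts != 0 else rs


def violations(t_hat, w_hat, observations):
    """Count measured signs that contradict the patterns predicted by a
    candidate translation direction t_hat and rotation direction w_hat.

    observations: (s, r, sign) triples, sign being the measured sign of the
    flow along the s-longitudinal direction at unit image point r.
    """
    bad = 0
    for s, r, sign in observations:
        label = predicted_label(t_hat, w_hat, s, r)
        # skip "don't know" (None) and zero labels, and zero measurements
        if label and sign and sign != label:
            bad += 1
    return bad


def best_candidate(candidates, observations):
    """Exhaustive search over (t_hat, w_hat) pairs for the fewest violations."""
    return min(candidates, key=lambda c: violations(c[0], c[1], observations))


# Simulation helpers, used only to generate test data.
def simulated_sign(t, w, s, r, depth):
    tr = dot(t, r)
    wr = cross(w, r)
    flow = [(tr * ri - ti) / depth - ci for ri, ti, ci in zip(r, t, wr)]
    k = dot(s, r)
    return sgn(dot(flow, [k * ri - si for ri, si in zip(r, s)]))


def random_unit(rng):
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        n = math.sqrt(dot(v, v))
        if 1e-3 < n <= 1.0:
            return [c / n for c in v]
```

In practice the candidate list would come from a grid of directions on the sphere and the observations from normal-flow signs; a working system would also have to tolerate some wrong signs, which counting violations rather than demanding zero allows.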

After the directions of t and ω are estimated using the sign of the flow along various
directions, the length of ω, i.e., the exact rotation, can easily be estimated using the values
of  the  flow  measurements.  Also,  after  deriving  the  3D  motion  from  the  information  supplied 
by  the  patterns,  the  system  could  estimate  optical  flow  for  the  purpose  of  deriving  3D  scene 
structure  and  estimating  the  FOE  and  AOR  more  accurately.  Usually,  in  a  working  system, 
information  from  other  senses — such  as  inertial  sensors — is  utilized  in  addition. 
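
One possible way to obtain the length of ω is sketched here under the assumption that flow values (not only signs) are available along the t-latitudinal directions −t̂ × r once t̂ and ω̂ are known. Along those directions the translational flow vanishes at every depth, since t·(t̂ × r) = 0, and the remaining rotational projection is λ[ω̂·t̂ − (ω̂·r)(t̂·r)] with ω = λω̂, so a least-squares fit of the single scale λ follows. This is an illustration, not necessarily the estimator the authors had in mind; all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def rigid_flow(t, w, r, depth):
    """Simulated image velocity (1/|R|)[(t·r) r - t] - w x r at unit point r."""
    tr = dot(t, r)
    wr = cross(w, r)
    return tuple((tr * ri - ti) / depth - ci for ri, ti, ci in zip(r, t, wr))


def rotation_magnitude(t_hat, w_hat, samples):
    """Least-squares estimate of lam in w = lam * w_hat.

    samples: (r, m) pairs, m being the flow at unit point r projected on the
    unnormalised t-latitudinal vector -t_hat x r. Along these directions the
    translational flow vanishes, so m = lam * a(r) with
    a(r) = w_hat·t_hat - (w_hat·r)(t_hat·r), whatever the depth.
    """
    num = den = 0.0
    for r, m in samples:
        a = dot(w_hat, t_hat) - dot(w_hat, r) * dot(t_hat, r)
        num += a * m
        den += a * a
    return num / den
```

With noise-free simulated samples the fit recovers λ exactly up to rounding, whatever depths are used.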

4  Image  Motion  Patterns  for  a  Planar  Retina 

For  the  case  of  a  planar  retina  the  latitudinal  and  longitudinal  fields  take  a  different  form. 
There  is  a  simple  way  of  visualizing  longitudinal  and  latitudinal  fields  on  the  sphere  that 
carries  through  to  the  case  of  a  planar  retina.  Given  an  axis  s,  consider  the  family  of  cones 
with  apex  at  the  center  of  the  sphere  and  axis  s.  The  intersection  of  these  cones  with  the 
sphere  provides  circles  lying  on  planes  perpendicular  to  s.  The  unit  vectors  perpendicular 
to  the  circles  and  tangential  to  the  sphere  form  a  longitudinal  field.  The  intersection  of  the 
family  of  cones  with  a  plane  gives  rise  to  a  family  of  conic  sections  whose  form  (ellipses, 
hyperbolas,  parabolas)  depends  on  the  angle  between  the  axis  s  and  the  plane.  The  vectors 
perpendicular  to  these  conics  correspond  to  the  longitudinal  field  (Figure  8a).  On  the  sphere 
the  latitudinal  field  is  perpendicular  to  the  great  circles  containing  s.  These  great  circles 
when  projected  onto  a  plane  become  straight  lines  passing  through  a  common  point.  The 
vectors  perpendicular  to  these  lines  correspond  to  the  latitudinal  vector  field  (Figure  8b). 
The longitudinal fields on the plane are of a simple form when the chosen axis is parallel to
an axis of a Cartesian 3D coordinate system attached to the observer; they are called α-fields
(axis parallel to the x-axis), β-fields (axis parallel to the y-axis) and γ-fields (axis parallel
to the z- or optical axis) (Figure 9). As before, the sign of the projection of the motion field
on these particular fields possesses a rich structure as shown in Figure 10.
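
The planar fields can be generated from the cone construction above. In this sketch (a hypothetical helper with an assumed sign convention), h(x, y) = (s·p)/|p| with p = (x, y, f) is the cosine of the angle between the viewing ray through image point (x, y) and the axis s, so its level sets are the conics cut from the cones. The negative gradient of h is perpendicular to these conics and points towards larger angles from s, the planar counterpart of pointing away from N. For the α-field (s along x) the level sets x/|p| = c are the hyperbolas x²(1 − c²) − c²y² = c²f², and for the γ-field (s along the optical axis) they are circles around the image center, matching Figure 9.

```python
import math


def planar_longitudinal(s, x, y, f):
    """Unnormalised s-longitudinal vector at image point (x, y) of a planar
    retina with focal length f.

    h(x, y) = (s·p)/|p|, p = (x, y, f), is the cosine of the angle between
    the viewing ray and s; its level sets are the conics cut from the cones
    with axis s. The vector returned is -grad h: perpendicular to the conic
    through (x, y) and pointing towards decreasing h.
    """
    n2 = x * x + y * y + f * f
    n = math.sqrt(n2)
    sp = s[0] * x + s[1] * y + s[2] * f
    gx = s[0] / n - sp * x / (n2 * n)
    gy = s[1] / n - sp * y / (n2 * n)
    return (-gx, -gy)


ALPHA, BETA, GAMMA = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
```

For the γ-field the result is proportional to (x, y), i.e., radial; for the α-field it is proportional to (−(y² + f²), xy), and for the β-field to (xy, −(x² + f²)).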


Figure  8:  (a)  In  the  plane  the  longitudinal  vectors  become  perpendicular  to  conic  sections 
defined  by  a  family  of  cones  with  axis  parallel  to  s.  (b)  The  s-latitudinal  vectors  become 
perpendicular to straight lines passing through the intersection s₀ of s with the plane.


Figures  11,  12  and  13  show  results  from  experiments  on  synthetic  spherical  images  and 
real  planar  images  from  an  indoor  and  an  outdoor  scene. 


5  Relationship  to  Other  Computational  Approaches 

The pattern matching approach to egomotion estimation does not directly relate to traditional
computational studies on the perception of 3D motion. Traditional studies, with a few
exceptions  [5,  6],  addressed  the  problem  in  two  steps.  In  the  first  step  the  optical  flow  field 
was  estimated  as  an  approximation  to  the  motion  field.  In  the  second  step,  the  3D  motion 
Figure 9: Positive α, β and γ vectors. If the s-axis is the x-, y- or z-axis, the corresponding
longitudinal vector fields are vectors perpendicular to: (a) and (b) hyperbolas whose axes
coincide with the vertical and horizontal axes on the image plane, or (c) concentric circles
with their center at the origin of the image.


Figure 10: α, β and γ patterns. (α) Image motion measurements along the α-vectors form
patterns of positive and negative values which are defined by a horizontal straight line and
a hyperbola. (β) β-patterns are defined by a vertical straight line and a hyperbola, and
(γ) γ-patterns are defined by a straight line through the image center and a circle passing
through the center of the image. The intersection of the straight lines gives the AOR and
the intersection of the conics gives the FOE.

was  estimated  through  a  local  decomposition  of  the  optical  flow  field  [7-11].  In  the  scheme 
described  here,  the  retinal  motion  information  utilized  consists  of  the  signs  of  the  optical 
flow  along  certain  directions.  In  other  words,  for  a  vector  v  on  the  image,  the  information 
needed  is  whether  the  flow  along  the  line  defined  by  v  has  the  sign  of  v  or  —v.  This  is  a 
robust  qualitative  property  of  the  optical  flow,  and  as  demonstrated  here,  it  is  sufficient  for 
the task of egomotion perception. In the literature, it has been argued [12, 13] that qualitative
estimates of optical flow are often sufficient for many tasks; for instance, for the task of
detecting  a  potential  crash  [14],  not  even  a  precise  measurement  of  the  normal  component 
of  the  flow  may  be  necessary.  As  suggested  in  [15],  “it  is  sufficient  that  the  image  motion 
estimate be qualitatively consistent with the perspective 2D projection of the ‘true’ 3D

Figure 11: (a) Using a graphics package, the image of a synthetic environment (a glass table
set up in front of a window; notice the reflection on the glass and the sky background)
including a sphere (a blue ball in front of a cheese) has been created. From the resulting
images the normal flow has been computed. (b), (c) and (d) show, overlaid on the spherical
image, the positive and negative areas for three different patterns; red corresponds to positive
and green to negative. The motion is graphically rendered through its translation axis (black
axis) and its rotation axis (grey axis). The three axes s chosen for the patterns are the x-, y-
and z-axes. (e), (f) and (g) show for the same configuration the spherical retinae containing
only the patterns (pink denoting positive areas, turquoise negative areas, and blue “don’t-care”
areas).

Figure 12: For a sequence of images taken by a sensor moving rigidly, the normal flow field was
estimated. The image size is 574 × 652, the focal length in the x-direction is 1163 pixels, and the
focal length in the y-direction is 1362 pixels. The center of the image is (332, 305) (measuring from
the bottom left corner), the FOE is (255, 129), and the AOR is (496, 160). (a) shows the first frame
of the sequence and (b) shows the normal flow field corresponding to the first and second frame. (c)
shows the positive and negative γ-vectors found from the normal flow field (blue denotes negative
and red denotes positive) and (d) shows the fitting of the γ-pattern to the γ-vectors in the final
stage, after all patterns have been computed. Similarly, (e) and (g) show the positive and negative
α- and β-vectors, and (f) and (h) the fitting of the α-pattern to the α-vectors and the β-pattern
to the β-vectors in the final stage. (i) shows the curves (red) and the straight lines (blue) of the
α-, β- and γ-patterns superimposed on the image. The intersection of the second order curves
provides the FOE and the intersection of the straight lines gives the AOR.

Figure 13: A camera mounted on the Unmanned Ground Vehicle, developed by Martin
Marietta Corp. under a contract with the U.S. Government, captured a sequence of images
as the vehicle moved along rough terrain in the countryside, thus undergoing continuously
changing rigid motion. (a) shows one frame of the sequence with the normal flow field overlaid
in red. (b), (d) and (f) show the positive α-, β- and γ-vectors and (c), (e) and (g) show fitted
α-, β- and γ-patterns in the final stage after all the patterns have been computed. (h) shows
superimposed on the image the boundaries of the patterns whose intersections provide the
FOE and the AOR. (i) Because measurements are not everywhere available (strong spatial
gradients are sparse), a set of patterns can be fitted with accuracy above the threshold
of 97% (where accuracy is defined as the ratio of the number of successfully fitted pixels over
the total number of pixels in the pattern), resulting in solutions for the FOE and the AOR
lying within two bounded areas (red: FOE; green: AOR).

velocity field. Even estimates that don’t correspond to image velocity, like the ones derived
by  Reichardt’s  correlation  model  or  equivalent  energy  models  [16-19],  may  be  acceptable  for 
several  visual  tasks  if  the  estimates  are  consistent  over  the  visual  field.”  The  pattern-based 
approach  to  the  problem  of  egomotion  estimation  proves  the  feasibility  of  such  ideas  about 
qualitative  visual  motion  analysis. 
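
As a worked example of how the image-plane locations reported with Figure 12 translate into 3D directions, assume an ideal pinhole camera with the stated focal lengths and image center, coordinates measured from the bottom left corner as in that caption. An image point (u, v) then lies on the viewing ray ((u − 332)/1163, (v − 305)/1362, 1); normalised, the ray through the FOE is the direction of translation and the ray through the AOR is the direction of the rotation axis, each up to sign. The helper names are hypothetical.

```python
import math


def pixel_to_direction(u, v, cx, cy, fx, fy):
    """Unit viewing direction for pixel (u, v) under an assumed pinhole model
    with principal point (cx, cy) and focal lengths fx, fy in pixels."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)


def angle_from_optical_axis_deg(d):
    """Angle in degrees between unit direction d and the optical (z) axis."""
    return math.degrees(math.acos(d[2]))


# Calibration and pattern locations as listed in the caption of Figure 12.
CX, CY, FX, FY = 332.0, 305.0, 1163.0, 1362.0
foe_dir = pixel_to_direction(255.0, 129.0, CX, CY, FX, FY)   # direction of t
aor_dir = pixel_to_direction(496.0, 160.0, CX, CY, FX, FY)   # direction of omega
```

With these numbers the translation direction lies roughly 8 degrees, and the rotation axis roughly 10 degrees, from the optical axis.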

6  The  Motion  Pathway  in  Primates 

The  computational  framework  described  here  is  consistent  with  findings  in  neurobiology 
regarding  the  structure  and  functional  properties  of  neurons  in  the  visual  motion  pathway 
(Figure 14) [20]. According to Movshon (and many others), in the early stages, from the
retinal Pα ganglion cells through the magnocellular LGN cells to layer 4Cα of V1, the cells
appear functionally homogeneous and respond almost equally well to the movement of a
bar (moving perpendicularly to its orientation) in any direction (Figure 14a). Within layer
4C of V1 an onset of orientation selectivity is observed. The receptive fields of the neurons
here are divided into separate excitatory and inhibitory regions. The regions are arranged in
parallel stripes, and this arrangement provides the neurons with a preference for a particular
orientation of a bar target (which is displayed in the polar diagram) (Figure 14b). In layer 4B
of V1 another major transformation takes place with the appearance of directional selectivity.
The receptive fields here are relatively large and they seem to be excited everywhere by light
or dark targets. In addition, these neurons respond better or solely to one direction of motion
of an optimally oriented bar target, and less or not at all to the other (Figure 14c). In MT,
neurons have considerably larger receptive fields, and the precision of their selectivity
for direction of motion is in general lower than in V1 (Figure 14d). In
MST the size of the receptive fields of neurons becomes even larger, ranging from 30 degrees
to 100 degrees, each responding to particular 3D motion configurations [1, 2, 21-24].

One  can  easily  envision  an  architecture  that,  using  neurons  with  the  properties  listed 
above,  implements  a  global  decomposition  of  the  motion  field  using  the  signs  of  the  motion 
vectors  along  appropriately  chosen  directions.  Neurons  of  the  first  kind  could  be  involved  in 
the  estimation  of  local  retinal  motion  information;  they  could  be  thought  of  as  computing 
Figure 14: The spatial structure of visual receptive fields and their directional selectivity
at  different  levels  of  the  motion  pathway  (from  [20]).  The  spatial  scales  of  the  receptive 
fields  (0.1  degree,  etc.)  listed  here  are  for  neurons  at  the  center  of  gaze;  in  the  periphery 
these  dimensions  would  be  larger.  The  polar  diagrams  illustrate  responses  to  variation  in 
the  direction  of  a  bar  target  oriented  at  right  angles  to  its  direction  of  motion.  The  angular 
coordinate  in  the  polar  diagram  indicates  the  direction  of  motion  and  the  radial  coordinate 
the  magnitude  of  the  response. 


whether  the  projection  of  retinal  motion  along  some  direction  is  positive  or  negative.  Neurons 
of  the  second  kind  could  be  involved  in  the  selection  of  local  vectors  in  particular  directions  as 
parts  of  the  various  different  patterns  discussed  in  this  paper,  while  neurons  of  the  third  kind 
could  be  involved  in  computing  the  sign  (positive  or  negative)  of  pattern  vectors  for  areas  in 
the  image;  i.e.,  they  might  compute,  for  image  patches  of  different  sizes,  whether  the  flow  in 
certain  directions  is  positive  or  negative.  Finally,  neurons  of  the  last  kind  (MT  and  MST) 
could  be  the  ones  that  piece  together  the  parts  of  the  patterns  found  already  into  global 
patterns that are matched with prestored global patterns. Matches provide information
about  egomotion  and  mismatches  provide  information  about  independent  motion. 

Results from the cognitive sciences have suggested that a large part of visual perception
may be realized as pattern matching. Also, most of the early work in computational
vision was concerned with the development of general pattern matching techniques, without
paying much attention to the nature of visual patterns. The body of work conducted
in  computational  modeling  and  perceptual  disciplines  in  the  last  decades  (see  for  example 
[1,  2,  21-27])  now  provides  the  prerequisites  for  addressing  the  difficult  question  of  what 
patterns  are  relevant  to  particular  visual  tasks. 

The  pattern  matching  techniques  described  here  for  handling  visual  motion  were  based 
on  purely  theoretical  considerations.  It  remains  to  be  shown  if  such  schemes  or  variants 
of them are actually implemented in biological organisms, whether these are insects possessing
spherical-like compound eyes [28, 29] or primates with elaborate and sophisticated
motion  processing  capabilities.  Recent  findings  provide  a  good  motivation:  in  area  V4  of 
the  macaque  monkey  neurons  have  been  found  [30]  whose  function  has  been  linked  to  shape 
processing.  These  neurons  respond  to  gratings  of  the  same  nature  as  those  utilized  in  the 
patterns  of  this  paper! 